DOCUMENT RESUME

ED 07« 102                                             TM 002 «65

AUTHOR         Offenberg, Robert M.
TITLE          Evolution of a Bilingual Evaluation.
INSTITUTION    Philadelphia School District, Pa. Office of Research
               and Evaluation.
PUB DATE       Feb 73
NOTE           17p.; Paper presented at the annual meeting of the
               American Educational Research Association, February
               1973

EDRS PRICE     MF-$0.65 HC-$3.29
DESCRIPTORS    *Bilingual Education; Criterion Referenced Tests;
               Elementary Grades; *Evaluation Methods; Forced Choice
               Technique; Formative Evaluation; Interviews; Program
               Descriptions; *Program Evaluation; Research Design;
               Speeches; Standardized Tests; Summative Evaluation

ABSTRACT
Evaluation of ongoing educational programs must
necessarily differ from the basic research design; it must change to
meet the changes of the program and its environment. Over the three
years of the operation of the Philadelphia "Let's Be Amigos"
bilingual program, the kinds of data generated in the program
evaluation have evolved in response to the demands of project
management, community and intra-school-system relations, and the
Office of Education. The evaluation of process aspects and product
aspects of the program have evolved in opposite directions: (1)
evaluation of the pupil performance program outcomes has tended to
evolve from informal, criterion-referent approaches to more rigorous
experimental designs; and (2) evaluation of processes has tended to
evolve from formal methods (observational checklists, forced-choice
questionnaires) to less rigorous methods (open-ended questionnaires,
interviews, etc.). In the first operational years, assessment of
pupils' reading was primarily criterion-referent, involving a
word-calling test. The assessment of reading skills was modified
after first-year evaluation, first passing through a phase in which
an attempt was made to prepare materials-derived, criterion-referent
tests to assess more complex skills, and from there to standardized
tests. Evaluation of curriculum development has evolved from use of a
formal checklist to use of an interview structure with open-ended
questions. (KM)




EVOLUTION OF A BILINGUAL EVALUATION



by



Robert Offenberg
Research Associate



Read at the 1973 Convention of the American Educational Research Association



OFFICE OF RESEARCH AND EVALUATION
Division of Instructional Research and Development

Edward K. Brown
Director 



February 1973 



Evolution of the evaluation of a bilingual program

The classical experimental design is a static design. A set of
hypotheses is conjectured, and a series of observations or measurements
is made which either confirm or deny the validity of these hypotheses.
One goal is always to complete the experiment and say "aye" or "nay"
at the end. A change in the goals of the experiment should never occur
midstream, but only after the data has provided insight into the truth
of the original hypotheses.

This model, the basic research model, is perfect for assessing the
impact of phenomena like the agricultural experiments and laboratory
experiments for which most designs and statistical analyses were developed:
phenomena which can be isolated from evolutionary forces like the ones which
impinge on educational programs from outside, or which develop within them.

In contrast to a classical experiment is one which is subject to
these forces for change. Philadelphia's Let's Be Amigos bilingual
program is one of the first group of projects which required
comprehensive evaluation of both project outcomes and project processes under
Federal guidelines for program accountability developed for Title VII
and VIII of the Elementary and Secondary Education Act. Over the three
years that the project has been operational, the kinds of data generated
in the program evaluation have evolved in response to the demands of
project management, demands of community and intra-school-system relations,
and the demands of the funding agency, the Office of Education. The
pattern of this evolution seems systematic enough to warrant the hypothesis
that evaluation of other programs may evolve in similar ways.

The evaluation of process aspects and product aspects of the program
appeared to evolve in opposite directions as the project matured: (1)
Evaluation of the pupil performance program outcomes has tended to evolve
from informal, criterion-referent approaches to more rigorous experimental
designs and (2) evaluation of processes has tended to evolve from formal
methods (observational checklists, forced choice questionnaires) to less
rigorous methods (open-ended questionnaires, interviews, etc.). In this
paper, I will describe the changes occurring in the assessment of one aspect
of each of these two types of evaluation. The assessment of reading in the
Model School component, which serves elementary school English- and Spanish-
speaking children, is the first. The assessment of the curriculum development
process for the project as a whole is the second.

One subject area where there was a clear evolution of product
evaluation from soft approaches to rigorous ones was reading. In the first
operational years, assessment of pupils' reading was primarily criterion-referent.
A word-calling test was developed in which performance could be assessed
directly in terms of the materials presented. Pupils were asked to "call"
a sample of the words appearing in the reading series in the order in which
they were presented. A sample of the data gathered in the first grade in
this way is shown in Figure 1. It shows the percent of pupils at mid-year who
could call each of the words in the preprimer. The evaluation plan was to compare
the pupil performance with a criterion, and decisions were made on the basis
of the outcome. The criterion was that the average word would be recognized
by 80% of the pupils. As was reported in an earlier paper (Offenberg, 1971),
results obtained were below expected levels for Spanish speakers reading a
Spanish reading text, but at expected levels for English speakers reading
an English reader. Because the data gathered in this study were directly
tied to curriculum materials, they were very useful for program modification.
The program personnel found that, because there was a greater-than-necessary
level of re-entry of materials, the text used in Spanish was too long for
the time allotted to reading instruction in the program.
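
The criterion comparison described above can be illustrated with a short sketch. It is only an illustration, not the procedure actually used in the project: the function names, the data layout, and the three sample words with their pupil responses are all hypothetical. Only the 80 percent criterion comes from the text.

```python
from statistics import mean

# From the paper: the average word should be recognized by 80% of the pupils.
CRITERION = 0.80


def percent_recognizing(calls_by_word):
    """Map each word to the fraction of pupils who called it correctly.

    calls_by_word: dict of word -> list of booleans, one entry per pupil.
    (Hypothetical data layout, assumed for this sketch.)
    """
    return {word: sum(calls) / len(calls) for word, calls in calls_by_word.items()}


def meets_criterion(calls_by_word, criterion=CRITERION):
    """Return (average recognition rate across words, criterion reached?)."""
    rates = percent_recognizing(calls_by_word)
    average = mean(rates.values())
    return average, average >= criterion


# Hypothetical mid-year data: three preprimer words, five pupils each.
sample = {
    "mira": [True, True, True, True, False],
    "casa": [True, True, False, True, False],
    "perro": [True, False, False, True, False],
}
average, passed = meets_criterion(sample)
print(f"average recognition = {average:.2f}, criterion met: {passed}")
```

Keeping the per-word rates alongside the average mirrors the point made in the text: because the data are tied to individual words in the materials, they can guide program modification as well as a pass or fail decision.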



The finding led to a modification of the use of materials: stories
in which material was virtually all review were omitted, in order to speed
up the rate of acquisition of new skills.

The assessment of reading skills was modified after first-year
evaluation, first passing through a phase in which an attempt was made to prepare
materials-derived, criterion-referent tests to assess more complex skills,
and from there to standardized testing (Offenberg 1972 and 1973).

In the first of these phases, instruments were developed to assess
more complex skills than the word calling examined in the first year.
The instruments attempted to measure the children's abilities (a) to read
and understand single words through the matching of words with pictures
(see Figure 2) and (b) to read and understand paragraphs (see Figure 3).
To assure that the instruments were a good reflection of skills being taught in
the program, they were developed by experienced teachers and supervisors of
the project. They prepared items closely related to contents of the texts.

At first it seemed easy to develop "criterion" instruments in the formats
shown in the figures. However, when the instruments were used and the results
analyzed, problems emerged. The tests lacked the qualities which they needed
to be useful tools. Examination of the item analyses showed that there was
little relationship between anticipated difficulties of items and the number
of pupils who correctly completed them. Secondly, the expected relationships
between pupils' success with the items and total scores were not found.
Most items had correlations near zero and a fair number had negative
correlations with the total score.
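
The two item statistics mentioned above (the proportion of pupils answering an item correctly, and the correlation between success on an item and the total score) can be sketched as follows. This is a minimal illustration under assumed conventions, not the project's actual analysis: the function names and the tiny response matrix are hypothetical, and the correlation is an ordinary Pearson coefficient against the uncorrected total score.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def item_analysis(responses):
    """Return one (proportion correct, item-total correlation) pair per item.

    responses: list of per-pupil lists of 0/1 item scores (assumed layout).
    """
    totals = [sum(pupil) for pupil in responses]
    results = []
    for i in range(len(responses[0])):
        item = [pupil[i] for pupil in responses]
        proportion_correct = sum(item) / len(item)
        results.append((proportion_correct, pearson(item, totals)))
    return results


# Hypothetical scores: four pupils, three items.
scores = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
for number, (p, r) in enumerate(item_analysis(scores), start=1):
    print(f"item {number}: proportion correct = {p:.2f}, item-total r = {r:.2f}")
```

In this made-up example the third item correlates negatively with the total score, which is the kind of result the text describes as a sign that an item is not working as intended.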

I was considering a major overhaul of the reading test package when
it became apparent that the need for a criterion-referent approach was
being supplanted by a greater need for program outcome assessment which had
greater meaning outside the context of the project than had the criterion-
referent approach. It was now the third operational year of the program,
and the third year of funding of bilingual programs under Title VII.
I received a phone call from the project director, who was in Washington.
The call brought home to us for the first time the realization that, despite
the Office of Education's emphasis on criterion-referent approaches and
evaluations designed to be useful for project modification, change to a
more traditional experimental design and instrumentation would be needed.

In the phone call the project director said that the Office of
Education was preparing for testimony for Congress, and wanted to know
if we had any concrete data showing gains or growth brought about by the
project. When I provided criterion-referent data showing that, within
the limits imposed by the state of our tests' development, the pupils
could master most of the skills that project planners claimed they could,
the person on the other side of the phone said that Congress did not
care about meetings of expectancies: they wanted a number, an amount of
gain, which they could use to show that bilingual education was "better"
than the education which it replaced. Further discussion with the project
director indicated that not only the Federal Government, but also other
members of the school community wanted some "hard" data, leading to the
decision to overhaul the evaluation of performance in the reading skills
area.

The shifting of evaluation to a more classical experimental approach
brought with it a major problem. Implied in every classical psychological
or educational research design is some baseline or comparison
behavior to which the treatment group's behavior can be compared. The
laboratory ideal for this comparison is a randomly selected control group.
This rarely can take place in an educational setting, especially with
a program as politically potent as bilingual programs, which held out the
promise of greatly improved academic growth for Spanish-speaking children,
and which involved the community in its planning. Implementation of a
classical design required an answer to the question, "Whose child would
be in the control group?"

Fortuitously, the evaluation being conducted in Philadelphia was
being conducted by a division of the school system itself, and hence
the system could provide some background for a reasonably tight
"quasi-experimental" design.

An unpublished study was conducted in 1968, in which all "Spanish
Speaking" children in the city of Philadelphia were tested in their
mother tongue with standardized tests in reading. The children tested
in this group either migrated from a Spanish speaking area, or were
children of parents who had migrated. This group provided some
"baseline" of pre-program performance, against which current performance of
Spanish-speaking children could be judged. The regular city-wide testing
program provided baselines for English speaking children.

The testing procedure used with the baseline group in 1968 was
replicated in the program. The results (see Figure 4) show that
performance of Spanish-speaking pupils was substantially greater than that
of the baseline, suggesting that the program had enhanced performance of
Spanish-speaking pupils. Smaller gains were obtained for English-speaking
pupils.
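
One simple way to express a program-versus-baseline comparison of this kind is a standardized mean difference computed from summary statistics. The sketch below is an assumption made for illustration: it is not the analysis reported in the paper (Figure 4 reports a multivariate analysis of variance), and the function names and all numbers in the example are hypothetical.

```python
from math import sqrt


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2))


def standardized_difference(mean_program, sd_program, n_program,
                            mean_baseline, sd_baseline, n_baseline):
    """Program mean minus baseline mean, in pooled-SD units (Cohen's d)."""
    spread = pooled_sd(sd_program, n_program, sd_baseline, n_baseline)
    return (mean_program - mean_baseline) / spread


# Hypothetical summary statistics, chosen only to show the arithmetic.
d = standardized_difference(
    mean_program=50.0, sd_program=10.0, n_program=90,
    mean_baseline=45.0, sd_baseline=10.0, n_baseline=300,
)
print(f"standardized difference = {d:.2f}")
```

Because the baseline group was tested in an earlier year rather than assigned at random, a difference of this kind describes the size of the gap but cannot by itself rule out other explanations, which is why the design is called quasi-experimental in the text.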

Many of you will recognize the process which I have described as an
evolution from a formative evaluation approach to a summative one. The
force which led to the evolution was a change from an approach which
provided information for project management to a need to demonstrate the
value of the program to people outside it, at the expense of providing
rich product-evaluation data for program modification.



In contrast to the product evaluation, examination of the processes
always served project management functions. Its evaluation seems to have
moved from formal, structured approaches to more informal ones. The first
year's evaluation of program processes was geared to developing clear-cut
statements of intended program processes, the success criteria of program
management. In the subsequent years, process evaluation has become geared
to answering questions of "how" and "why," in order to gain insight into
program operation, and to gather meaningful ways of correcting problems.
The early approach was to develop and use high-face-validity questionnaires
and checklists which were completed by project personnel, expert judges, or
members of the target groups. An example of this approach was the Curriculum
Development Checklist. Data from this instrument is shown in Figure 5. The
Curriculum Development Checklist embodied the criteria which the project
director's staff had set to determine the degree to which developed materials
met the internal standards of the project. Once these standards had been
specified, the coordinator of curriculum development used the checklist to
review the work of his subordinates, and to guide him in the upgrading of the
curriculum preparation process. The data that was produced was to some extent
a by-product of the instrument's main function, control of the curriculum
preparation process. However, the critical element was not how the instrument's
data was used, but that it came from an instrument based on expectancies which
were clear enough to be put into questionnaire items which could, for the most
part, be answered by a "yes" or "no" comment.

In contrast, Figure 6 shows the instrument used to gain insight into
the problems of the curriculum distribution process. It was developed in the
third operational year. It is a structure for an interview which was conducted
by members of the research staff in order to determine how the materials
distribution and tryout could be improved. As can be seen, items in it
are open-ended. The instrument served mainly as a guide to assure that
the correct topics are discussed, but makes few assumptions as to what
the responses will be. In the course of analyzing the data, response
categories were developed to turn the reactions obtained into countables
which could be easily summarized. In contrast to the previous instrument,
these categories could not be developed until after the answers had been given.
Despite the looseness of the data gathering process, the recommendations
which came from the interviews were pointed. It was found that (a) teachers
needed more specific course outlines and schedules which showed them the
specific topics and materials to teach; and (b) the project management
needed to clarify their roles and clarify the functions of the curriculum
centers of the project.
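
The step of turning open-ended interview answers into "countables" can be sketched as below. This is a hypothetical illustration only: in the project the response categories were developed by the research staff after reading the answers, whereas the keyword matching, the category names, and the sample answers here are stand-ins invented for the example.

```python
from collections import Counter

# Hypothetical coding scheme, written only after the answers were read:
# category name -> keywords that signal it.
CATEGORIES = {
    "needs course outline or schedule": ("outline", "schedule"),
    "roles or functions unclear": ("role", "function"),
}


def code_response(text, categories=CATEGORIES):
    """Return every category whose keywords appear in one free-text answer."""
    lowered = text.lower()
    matched = [
        name
        for name, keywords in categories.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return matched or ["uncategorized"]


def tally(responses, categories=CATEGORIES):
    """Count how many answers fall into each category."""
    counts = Counter()
    for response in responses:
        counts.update(code_response(response, categories))
    return counts


# Hypothetical interview answers.
answers = [
    "I need a schedule that says which unit to teach each week.",
    "Nobody explained the role of the curriculum center.",
    "No problems so far.",
]
for category, count in tally(answers).items():
    print(f"{category}: {count}")
```

The "uncategorized" bucket matters in practice: answers that fit no existing category are the ones that prompt a new category, which is why the scheme cannot be fixed before the interviews are held.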

To summarize, it appears that, unlike the classical experiment, which
should be carried to its logical conclusion, evaluation of ongoing
educational programs must change and adapt to meet the changes of the program
and the environment in which they are embedded. The experience of the
Let's Be Amigos program suggests that the evaluation of pupil outcomes will
become more formal and "experiment-like" and evaluation of process will
become more open and informal.







Figure 2. Sample Items from the "Pictures" Criterion Test, Spanish
Version (Second and Third Grade)




Kate walked slowly back to her house.
In her house, she got some paper and
made a sign. Then she took the sign
and went back into the street.



A big tiger was behind the rocks.
[...] mean-looking tiger.
He looked ready to jump.
"Come on," I yelled. "I'm scared!
Let's get out of here." We ran the
other way.

The boys saw a mean-looking ______



Figure 3. Sample Items from the "Paragraph" Criterion Test, English
Version (Second-Third Grade).



Comparison of Model Second- and Third-Grade Pupils With the Base Line
of Pupils in Philadelphia Schools

                                       Base Line               Models A & B
Grade and Subtest                  Mean  Percentile*  SD     Mean  Percentile*  SD

Second Grade
  Recognition of Words and Letters 43.04     35     15.27    48.70     45     16.20
  Word Meaning                      8.56     30      4.85    11.61     40      6.59
  Comprehension                     6.??     36      4.95     7.28     40      7.75
  Composite                        57.49     32     20.49     ??       45     ??.41

Third Grade                            (N = 33?)                (n = 94)
  Recognition of Words and Letters 49.76     32     15.25    58.39     61      8.48
  Word Meaning                      9.39     19      8.08    14.25     36      6.51
  Comprehension                     7.19     21      7.13    10.14     30      6.95
  Composite                        69.??     27     57.94    82.95     44     19.37

?? marks digits that are illegible in the source copy.


Multivariate Analysis of Variance

Source                                  F        df       p <

Grade Level
  Multivariate                        13.81    4/795     .001
  Recognition of Words and Letters    49.37    1/798     .001
  Word Meaning                         4.41    1/798     .04
  Comprehension                        9.39    1/798     .002
  Composite                           18.00    1/798     .001

Program
  Multivariate                        20.88    4/795     .001
  Recognition of Words and Letters    39.12    1/798     .001
  Word Meaning                        51.06    1/798     .001
  Comprehension                       14.93    1/798     .001
  Composite                           12.29    1/798     .001

Interaction of Grade Level and Program
  Multivariate                         1.47    4/795     ??
  Recognition of Words and Letters     1.72    1/798     NS
  Word Meaning                         2.66    1/798     NS
  Comprehension                        2.66    1/798     NS
  Composite                            ??      1/798     NS

*Percentiles are for second- and third-grade pupils in Puerto Rico
in the Spring semester.

Figure 4. Sample of the data and analyses used in the "Quasi-Experimental"
design phase of Reading Testing.



Summary of supervisor's ratings, on project-developed criteria,
of materials completed this year for five curricular units.

                                          Number of Units Rated

Criterion                                 Yes    No    Not Applicable

 1. Appropriate for intended               5      0      0
    grade levels.

 2. Appropriate for students'              5      0      0
    cultural background,
    interest level, and
    experiential field.

 3. Appropriate for students'              2      0      3
    previous knowledge in the
    subject matter field.

 4. Specific objectives                    1      4      0
    clearly stated.

 5. Sequential organization                4      0      1
    and structure.

 6. Observable performance                 1      4      0
    outcomes stated.

 7. Reasonable variety of                  5      0      0
    learning activities.

 8. Evaluation procedures                  3      2      0
    included.

 9. Provision for individual               5      0      0
    rate of learning included.

10. Teacher guide including                2      3      0
    suggested classroom
    procedures.

11. Availability of equipment.             1      4      0

12. Aids, materials needed                 1      -      2
    to teach unit specified,
    and where obtainable.

Figure 5. Sample of the data gathered using the Curriculum Development
Checklist.



Title VII Bilingual Program

Research and Evaluation

Foreign Languages

Structured Interview of Teachers Using
Program Developed Units.
Part I Curriculum Distribution



Identification:
School

Teacher's Name

Grade Level taught

Interviewer

Date



Find out which project developed materials the teacher is using in each
subject that he teaches.

iSSE Title Author 



Find out from whom the teacher got the materials

Note: (If the teacher does not mention supervisor, the school itself and the
Curriculum Center, ask specifically about them).



Note: (If it is not yet clear, find out whether the teacher knows about the
Curriculum Development Center at 219 Broad, Richard , and
the Materials Center at Potter-Thomas).



Figure 6. Sample of the open-ended type of questionnaire used in
evaluation of curriculum distribution.



Did the teacher request any materials? If so, what did they ask for,
whom did they ask, and did they get them? _______



How can we improve the distribution of materials in general for next year?



Next year we would like to examine pupil performance on some of the materials
which have been written for use in the project. How can we distribute
those materials, and what kind of support can we give to assure that they
get a fair trial?



Anything else about curriculum materials distribution that we should know?



Figure 6 (Part 2). Sample of the open-ended questionnaire used in
evaluation of curriculum distribution.



Offenberg, R. Impact of accountability on the development of a
bilingual program. Education, Vol. 93 #1, p. 78.

Offenberg, R. Title VII Project Let's Be Amigos: Evaluation of the
First Year. Philadelphia: The School District of Philadelphia,
1971. Reprinted by ERIC, Document ED 046295.

Offenberg, R. Title VII Project Let's Be Amigos: Evaluation of the
Second Year. Philadelphia: The School District of Philadelphia,
1972. ERIC in Press.

Offenberg, R. Title VII Project Let's Be Amigos: Evaluation of the
Third Year. Philadelphia: The School District of Philadelphia,
1973.






